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Bayesian ensemble refinement by replica simulations and reweighting 
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We describe different Bayesian ensemble refinement methods, examine their interrelation, and discuss 
their practical application. With ensemble refinement, the properties of dynamic and partially disordered 
(bio)molecular structures can be characterized by integrating a wide range of experimental data, including 
measurements of ensemble-averaged observables. We start from a Bayesian formulation in which the poste¬ 
rior is a functional that ranks different configuration space distributions. By maximizing this posterior, we 
derive an optimal Bayesian ensemble distribution. For discrete configurations, this optimal distribution is 
identical to that obtained by the maximum entropy “ensemble refinement of SAXS” (EROS) formulation. 
Bayesian replica ensemble refinement enhances the sampling of relevant configurations by imposing restraints 
on averages of observables in coupled replica molecular dynamics simulations. We show that the strength of 
the restraint should scale linearly with the number of replicas to ensure convergence to the optimal Bayesian 
result in the limit of infinitely many replicas. In the “Bayesian inference of ensembles” (BioEn) method, 
we combine the replica and EROS approaches to accelerate the convergence. An adaptive algorithm can be 
used to sample directly from the optimal ensemble, without replicas. We discuss the incorporation of single¬ 
molecule measurements and dynamic observables such as relaxation parameters. The theoretical analysis of 
different Bayesian ensemble refinement approaches provides a basis for practical applications and a starting 
point for further investigations. 


I. INTRODUCTION 

The problem of ensemble refinement^ becomes increas¬ 
ingly important as structural biology enters a new era 
in which dynamic and partially disordered biomolecular 
structures come into focusi^^— Such systems play cen¬ 
tral roles in biology, both in functional cellular processes 
ranging from signal transduction to the formation of large 
cellular structures, and in disease, including neurodegen- 
erative diseases such as Parkinson’s and Alzheimer’s. A 
broad range of methods have been developed to refine 
models of (bio)molecular structures against experimental 
data from X-ray crystallography, nuclear magnetic res¬ 
onance (NMR) spectroscopy, electron microscopy (EM), 
solution X-ray or neutron scattering (SAXS, SANS), and 
other methods. By and large, these refinement methods 
operate under the assumption that a single or a few well 
ordered structures should account for all the measure¬ 
ments. However, refinement of a single (or possibly a 
few) copies is not appropriate in systems with significant 
disorder. For unfolded^ or intrinsically disordered pro¬ 
teins (IDP)j^“— such as the a-synuclein peptide involved 
in Parkinson’s disease^^— we expect that a very broad 
range of structures is present in solution. None of these 
structures may individually satisfy all measurements, and 
even if one did, it may be highly atypical. Instead, most 
observables accessible to experiment report on averages 
over the entire ensemble of structures, and as such only 
the appropriate average over a model ensemble should 
match the experiment. 
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FIG. 1. Illustrative comparison of refined probability densi¬ 
ties p{(j>) of a dihedral angle (f from single-copy refinement 
[blue; Eqs. (I2I4|I : every member of the ensemble is expected 
to satisfy the measurement individually] and ensemble refine¬ 
ment [green; Eq. (I21II : the ensemble average is expected to 
satisfy the measurement]. The prior or reference distribution 
(magenta) used in the refinements is bimodal, i.e., with two 
dominant rotamer states. As indicated by the vertical black 
line, the observable is y{(j>) = (j> = 1.287r for single-copy re¬ 
finement and Y = (j) = 1.287r for ensemble refinement, with 
“experimental” error a — O.OJtt in both cases, and 6 = 1. 
Arrows indicate the changes in the relative weights of the two 
rotamers in the optimal Bayesian ensemble refined distribu¬ 
tion. 


Ensemble refinement is a challenging inverse problem 
in which one aims to characterize the high-dimensional 
configuration space of a molecular system on the basis of 
limited experimental information. It is therefore essential 
that ensemble refinement methods can properly integrate 
data from a broad range of experiments^>^d^ that may 
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report on molecular size and shape (e.g., from SAXS, 
SANS, or hydrodynamic measurements^d^d^) , the prox¬ 
imity (e.g., from cross-links) or distance between atoms 
and residues [e.g., from fluorescence resonant energy 
transfer (FRET), NMR,15riJ, including nuclear Over- 
hauser effects (NOE), or double electron-electron res¬ 
onance (DEER) measurement o^dS ] ^ the local chemical 
environment and structure (e.g., from NMR chemical 
shiftsi^ and J-couplingsi2iii or X-ray absorption spec¬ 
troscopy), all the way to measures of the global struc¬ 
ture (e.g., from X-ray crystal diffraction or electron 
microscop y^d^d^ ). Taking into account the uncertainties 
of the different experiments^^ is critical for the construc¬ 
tion of a properly weighted configurational ensemble. 

Inverse problems are typically ill-conditioned, i.e., sen¬ 
sitive to input parameter variations, and underdeter¬ 
mined. Such problems with high sensitivity and low 
data-to-parameter ratios are usually tackled through reg¬ 
ularization, for instance by assuming near-uniform and 
smooth solutions. Bayesian statistics offers a particu¬ 
larly elegant route for the inference of probabilistic mod¬ 
els from data (see, e.g.. Ref. [U for a general overview 
and Ref. for a pioneering application to biomolecular 
studies). In effect, the assumed prior distributions of the 
model parameters serve as regularizing factors, 

p(model|data) oc p(data|model)po(model), (1) 

written as a proportionality without the normalizing fac¬ 
tor. p(model|data) is the posterior distribution of the 
model, and po(model) is the prior that expresses our ex¬ 
pectations on the model and its parameters in the ab¬ 
sence of new data. p(data|model) is the conditional prob¬ 
ability of observing the data given the model, which for 
given data is the likelihood of the model. Consequently, 
we will in the following refer to p(data|model) as the like¬ 
lihood function. Importantly, in the absence of new data 
(or for non-informative data), one simply recovers the 
prior. 

The importance of ensemble refinement is best illus¬ 
trated by a simple example that anticipates some of the 
theoretical developments in this work. Figure [T] contrasts 
the stark differences in the results for single-copy and en¬ 
semble refinements of a simple model system with a prior 
or reference distribution with two dominant rotamers. 
In single-copy refinement, we determine how well each 
dihedral angle (j) individually agrees with the observa¬ 
tion = I.287r. This posterior probabil¬ 
ity p((^|data) = is concentrated in a sharp 

peak around the target value. By contrast, in ensem¬ 
ble refinement we seek a probability density p{4>) that is 
consistent with the observed average = (j). This 

p(())) = p*^°P*)(())) retains the character of the reference 
distribution that reflects the underlying physics, while 
redistributing some population from one rotamer to the 
other. 

Here, we will describe both formal and practical ap¬ 
proaches toward inferring ensemble distributions from 
diverse data. We will formulate the ensemble refine¬ 


ment problem first formally in a Bayesian framework 
in which the posterior is a functional that quantifies 
the relative probability of different ensemble probabil¬ 
ity densities p(x) for configurations x. Experimental 
uncertainties^^ are taken into account from the outset, 
which allows us to combine data from a variety of mea¬ 
surements. We then study algorithms to realize Bayesian 
ensemble refinement in practice. First, we will describe a 
method with which existing ensembles can be reweighted 
to match experiment. By variational maximization of the 
Bayesian posterior functional over the ensemble proba¬ 
bility densities p(x), we will derive an optimal Bayesian 
ensemble density p(°P*)(x), Eq. (1^ . for the continuous 
case and Eq. (1^ for the discrete case. Applied to sub¬ 
ensembles drawn according to the prior, this reweighting 
method turns out to be equivalent to the maximum en¬ 
tropy refinement procedure in the ensemble refinement of 
SAXS (EROS) methodic (which is different from the “en¬ 
semble refinement with orientational restraints” method 
with the same acronymi^). In the limit of infinite sam¬ 
ple size, the reweighting method converges to the optimal 
Bayesian ensemble refinement. Then, we will describe a 
Bayesian replica ensemble refinement method to perform 
ensemble refinement on the fly by running molecular sim¬ 
ulations of identical copies of the system with a bias on 
the averages calculated over these replicas. If the bias¬ 
ing potential is proportional to chi-squared (as twice the 
negative log-likelihood for Gaussian errors) scaled by the 
number of replicas N [see Eq. (HH)], one recovers the opti¬ 
mal Bayesian ensemble refinement [Eq. (j2ip ] in the limit 
of an infinite number of replicas. At the other extreme, in 
the limit of a single replica, the common-property refine¬ 
ment [Eq. (1121) ] is recovered, in which every member of 
the ensemble is expected to satisfy the measurements in¬ 
dividually, not just in the ensemble average. To speed 
up the convergence to the optimal Bayesian distribu¬ 
tion with increasing number of replicas N, we show how 
EROS and replica refinement (as well as other ensemble- 
biased simulation methods) can be combined with the 
help of free-energy reweighting methods, resulting in the 
“Bayesian inference of ensembles” (BioEn) method. An 
adaptive algorithm designed to sample directly from the 
optimal ensemble distribution, without multiple replicas, 
is presented in an Appendix. 

To illustrate the formal theory and the practical replica 
simulation approaches, we will introduce analytically or 
numerically tractable models of ensemble refinement. 
The solutions obtained for these models allow us to assess 
the mutual consistency of the methods, and to demon¬ 
strate the need for a size-consistent treatment in the 
Bayesian replica ensemble refinement with respect to the 
number of replicas. We also sketch how dynamic proper¬ 
ties can be integrated in ensemble refinement, albeit ap¬ 
proximately, and how the parameter expressing the con¬ 
fidence in the reference ensemble distribution can be cho¬ 
sen. We conclude by a summary of the main results and 
a discussion of possible applications, including the opti¬ 
mization of potential energy functions used for molecular 
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simulations. and ai is the standard deviation of measurement i. For 

simplicity, we assume in Eq. o that the errors in the 
different measurements i are uncorrelated. In the more 
II. THEORY general case of correlated errors, one can use 


A. Bayesian single-copy refinement in configuration space 


X^(x) = (5y^(x)S My(x) (5) 


Before venturing into ensemble refinement, we intro¬ 
duce notation and the general framework in the context 
of the more familiar single-copy refinement. Here one 
assumes that a single configuration can explain all mea¬ 
sured data. Different configurations can then be ranked, 
in a probabilistic manner, by their respective abilities to 
do so. 

In the following, we will use x to denote individual 
configurations. In a typical application to a molecu¬ 
lar system, x could be the 3n-dimensional vector x = 
{ri, r 2 ,..., r„} of the Cartesian coordinates of the n 
atoms. In single-copy refinement, we assume that one 
would ideally (i.e., without error) measure values 2 /i(x) 
of observable i, with i = 1,2,..., M, for a given config¬ 
uration X. The actual values observed (measured) are 
^(obs)^ By contrast, in ensemble refinements described 
below, the measured values of the observables will instead 
depend on the distribution over the entire configuration 
space, not just on a single configuration x. 

Using a reference distribution po{'x.) as a prior, in 
single-copy refinement we want to construct a poste¬ 
rior p(x|data) in configuration space that ranks config¬ 
urations X by their consistency with both experimental 
measurements and prior. The prior po(x) could for in¬ 
stance be the Boltzmann distribution for a simulation 
model described by a particular potential energy function 
C/(x), i.e., po(x) = exp[—/3I7(x)]/ f dx'expj—/3C/(x')] at 
reciprocal temperature /3 = l/ksT with ks the Boltz¬ 
mann constant and T the absolute temperature, or a 
statistical distribution of conformers of the Protein Data 
Bankj ^^d^ The normalized posterior distribution accord¬ 
ing to Eq. m is then 




Po(x)p({;/f^"^}|x) 

/ dx'po(x')p({2/i°'""^}|x0’ 


( 2 ) 


where p({y|°^'*^}|x) is the likelihood of x for data 
given as a set of M measured values, = 

The posterior p(x|{j/f‘’"^}) 
gives the probability density that configuration x is the 
single configuration underlying the data. 

In cases where the statistical errors are Gaussian, we 
define the likelihood function is 




( 3 ) 


where Sy is a vector of deviations, with elements 6yi{x.) = 
yi{x) — and S is the symmetric covariance ma¬ 

trix of the statistical errors (where for uncorrelated errors 
Uii = af and Ey — 0 lor i ^ j). Note that the measure¬ 
ments i can be from different measurements (say, NMR 
and single-molecule FRET) or from the same measure¬ 
ment (say, intensities at different wave vectors in a SAXS 
measurement). 

In practice, single-copy Bayesian refinement can then 
be performed by sampling directly from the poste¬ 
rior p(x|data), e.g., by running equilibrium simulations 
with an effective energy function C7eff (x) = U (x) — 
lnp(data|x). Alternatively, representative configura¬ 
tions can first be sampled from the reference distribution 
Po(x) and then reweighted by the likelihood according to 
Eq. dal). 


B. Bayesian ensemble refinement in probability density 
space 


In an alternative Bayesian formulation, we think of 
p(x) not as a posterior p(x|data) ranking individual con¬ 
figurations X with respect to their mutual consistency 
with prior and data, but as an actual probability den¬ 
sity of X in configuration space defining an ensemble. 
As a consequence, prior, likelihood, and posterior be¬ 
come functionals of the probability density p(x) in con¬ 
figuration space. We note that such “hyperensembles” 
have been studied by Crooks as models of nonequilibrium 
statesi^ Functional approaches are also used in varia¬ 
tional Bayesian methodsj^i 

To construct a prior in the space of probability den¬ 
sities p(x), with p(x) > 0 and J dx.p{x.) = 1, we use 
the Kullback-Leibler divergence or relative entropy with 
Po(x) as reference distribution (i.e., po(x) no longer is 
the prior, but defines the prior). We note that other 
measures of the difference between distributions could 
be used to regularize the Bayesian refinement. The rela¬ 
tive entropy provides us with a positive-definite measure 
of deviation between p(x) and the reference distribution 
Po(x). By weighting these deviations exponentially, we 
arrive at a prior functional 


7^0 [p(x)] oc exp 



dxp{x.) In 


P(x) \ 

Po(x)/ 


( 6 ) 


where 


M 


x“(x)=i: 

i=l 



with a parameter 0 > 0 expressing the level of confidence 
in the reference ensemble, and therefore in the under¬ 
lying potential energy surface (force field) and the ex¬ 
haustiveness of our sampling of po(x). High confidence 
is expressed through large values of 6. The choice of the 
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confidence factor 9 will be discussed in the section on 
Practical Considerations below. Here and in the follow¬ 
ing, we use a calligraphic font for functionals, and square 
brackets for their arguments. The posterior functional 
then becomes 


P[p(x)|data] 


oc exp 0 J c?xp(x) In'P[data|p(x)]. (7) 


In the following, we will first consider the case where 
the measured observables are properties common to all 
configurations before considering the case where the ob¬ 
servables are ensemble averages. 

a. Ensemble refinement for properties common to 
all configurations. In some cases, ensemble refinement 
should be used even if the observables are properties of in¬ 
dividual configurations. As an example, consider a disul¬ 
fide bond or other chemical cross-link that is present in 
essentially all proteins within a system. With respect 
to other degrees of freedom, the configurations may be 
disordered. Such cases require ensemble refinement, but 
with an experimental restraint that acts on each ensem¬ 
ble member individually. 

To quantify deviations from the observations, we use 
an approximate likelihood functional. For given p(x) and 
Gaussian errors <7i, the probability of the data is pro¬ 
portional to n*/ c^xp(x)exp(-[?/j(x) - ss 

exp(- / dxp(x) ), ignoring higher- 

order fluctuations in the squared errors. With this ap¬ 
proximation, we arrive at 

7^[data|p(x)] = (8) 


where 


X^[p(x)] dxp{x) 


/ N (obs) 

y^m - Vi 


( 9 ) 


is the mean-squared error of the common observables 
?/i(x), scaled by 1/af. 

To make progress, we now determine the normalized 
probability density p(°P‘^(x) that maximizes the posterior 
functional 'P[p(x)|{ 2 /|°^'*^}]. We define 


'CWx)] = -lnT’[p(x)|{ 2 /f’'"^}] A J dxp(x) 

= 0 y dxp(x) In 


/ dxp{x) 


Po(x) 

/ N (obs) 

y^m-yi 


( 10 ) 


2a/ 


-I- A y dxp(x). 


where the Lagrange multiplier A is used to ensure nor¬ 
malization, f dxp(x) = 1. £ trades off deviations of 

p(x) from the reference distribution against deviations 


between the predicted and measured observables. Set¬ 
ting the functional derivative with respect to p(x) to zero 
results in 


S£ 

Sp(x) 

+E 


= 9 


In 


P(x) 

Po(x) 


[2/i(x) 


di 


(obs) 


1 2 


2a? 


-k A = 0. 


( 11 ) 


By solving this equation for p(x) = p^°P*\x), we obtain 
an explicit expression for the optimal probability density 
in common-property ensemble refinement. 


p(opt)(x) (x po(x) exp 


-E 


[y*(x) 


■di 


(obs) 


20a? 


( 12 ) 


which can then be normalized to one by integration over 
X. We note that for 0 = I, this distribution is identi¬ 
cal to the Bayesian posterior of single-copy refinement in 

Eqs. ( 1311 ) . 

This procedure is closely related to the maximum- 
entropy method. In typical maximum-entropy ap¬ 
proaches, measurements are imposed as strict con¬ 
straints. By contrast, in the maximum-entropy formal¬ 
ism of Gull and Daniell,^^ noise is taken into account 
through a term. However, enters in the form of 
a constraint to match exactly an “expected value”, and 
0 is the corresponding Lagrange multiplier enforcing this 
constraint. Here, by contrast, we have no a priori expec¬ 
tations concerning the exact to be achieved in refine¬ 
ment. Instead, we express our confidence in the reference 
distribution through the choice of 0 (even though in prac¬ 
tice, 0 may be adjusted; see below). We note that later 
maximum entropy approaches accounting for noise in the 
data do not always draw this distinctio n^^i^^i^^ and min¬ 
imize functionals similar or identical to £. 

b. Refinement using ensemble averages. Next we as¬ 
sume that the measured quantities are averages of 

observables yi{x) over an ensemble of structures, as rep¬ 
resented by the functional 

3^4p(x)] = y dxp{x)yfix). (13) 


For simplicity and concreteness, we assume Gaussian er¬ 
rors and a likelihood functional correspondingly defined 
as 

iP[data|p(x)] (14) 


where 


X^[p(x)] 


E 


[/dxp(x)y,(x) 


(15) 


for a set of measurements i of ensemble-averaged observ¬ 
ables yfix). For correlated errors, Eq. m becomes 

y2[p(x)] = SY^i:-^SY 


(16) 






















5 


with 5Yi = J dxp(x)?/i(x) — We note that the 

general formalism is of course not limited to Gaussian 
errors. Substituting —2 ln7^[data|p(x)] for will lead to 
the corresponding expressions for more general likelihood 
functions. We note further that more general functionals 
can arise, e.g., if the measurements report on 

functions of averages, with measurements of the variance 
as the simplest case. 

In practice, one also has to deal with uncertainties 
calc ™ forward calculation of the observables 2 /i(x) 
from individual configurations x. Such uncertainties of¬ 
ten exceed the statistical errors in the measure¬ 

ments. Assuming that the two are uncorrelated, they can 
be lumped together, af = + ^i,ohs- Finally, both 

errors can only be estimated with some uncertainty. In a 
Bayesian formulation, errors can be treated as nuisance 
parameters and integrated out^Sl 

c. Sampling from the Bayesian posterior functional 
in ensemble refinement. The above formulation appears 
to be of limited practical value, as one would have to sam¬ 
ple in function space. One possible way to perform such 
sampling in practice is to discretize the problem. For 
instance, clustering can be used to break up the config¬ 
uration space into discrete subsets. If a set of configura¬ 
tions is drawn from the reference distribution po (x), the 
relative weight of each cluster a would then be pro¬ 
portional to the number of its members. For cluster a, 
the value for the observable i is yf, such that Eq. m 
becomes 


X^[wi,W2, ■. ■,wn] = ^ 


(s 


N t 


_ y (obs)y 


(17) 

with normalized weights Wa- These weights could then 
be sampled according to 


7^[wi,w; 2, ■ ■ • oc (18) 


exp 




V 


again under the normalization constraint, = 1- 

Equation (fTSl) is the discrete analog of Eq. ([7]). This 
form of Bayesian ensemble refinement can also be applied 
to a collection of N individual configurations, without 
clustering. If one starts from an equilibrium ensemble 
of Xq, drawn from the reference distribution po(x), then 
= l/N. 

d. Optimal configuration space distribution from 
Bayesian ensemble reweighting. Instead of sampling the 
probability densities p(x) or {wi,W 2 , ■. ■ ,wn} from the 
posterior functional, we can again try to find the most 
probable p(x) or Wa, as in Eqs. (11011121) above. Configura¬ 
tions X sampled according to this optimal p*^°P*^(x) define 
representative ensembles. To find the extremum of the 
posterior functional, we follow the same variational ap¬ 
proach as above and maximize the posterior functional in 


Eq. d?]) with respect to the probability density p(x). As 
optimization function, we use the negative logarithm of 
the posterior V, with a Lagrange multiplier A to enforce 
normalization. For Gaussian errors, we obtain 


£[p(x)] — ^ J dxp{x) In 

y dxp{x)y,{x) -Y. 


P(x) 

Po(x) 


(obs) 


E 


2a/ 


+ A 


(19) 


dxp(x), 


taking on a form that has been postulated as a start¬ 
ing point for a maximum entropy approachi ^^i^^ Here, 
Eq. (1191) is a direct consequence of posterior maximiza¬ 
tion, which would allow us to obtain corresponding log- 
posteriors also for more non-Kullback-Leibler priors and 
more complicated likelihood functions [e.g., for rigorous 
common-property refinement without the approximation 
preceding Eq. ([5])]. In such cases, postulating a proper 
maximum entropy formulation can be difficult. Varia¬ 
tional optimization of C results in 


SC 

Sp{x) 


In 


p(x) 

Po(x) 


-f 1 


( 20 ) 


yi(x)[/ dx'p{x')y,{x') - Y. 


(obs)i 


A = 0, 


a. 


which can be solved formally to give 

p(°P*)(x) (X (21) 

y.(x) [/dx'p(°P‘)(x')yz(x') - 


Po(x)exp 


-E' 


eaf 


We recognize Eq. (8) of Ref. [l|, albeit with a some¬ 
what different interpretation. There, 1/9 appears as a 
Lagrange multiplier “A” that has to be determined self- 
consistently such that the for p(°P‘)(x) matches a de¬ 
sired value, following the maximum-entropy prescription 
of Gull and Daniell;^'^ here, 0 is a parameter that ex¬ 
presses a priori the confidence in the reference distribu¬ 
tion. The normalization factor in Eq. m can be deter¬ 
mined by integration [which is equivalent to determining 
our Lagrange multiplier A in Eq. mi For correlated 
errors of the ensemble averages, Eq. m, the exponent 
in Eq. m should be replaced by 


( 22 ) 

Because the weight function pGpt)(x) appears inside the 
square in the term of Eq. (HU), we have ended up 
with a nonlinear integral equation, Eq. (ICT) , for p(°P*\x) 
that will usually be difficult to solve, in particular for 
high-dimensional problems. We note, however, that for 
refinement without explicit consideration of errors, adap¬ 
tive methods have been developedj^^— Uncertainties are 
considered by Beauchamp et al.^ albeit with a num¬ 
ber of additional priors introduced for constants acting 


-i^y.(x)(S-i)., 
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as weight factors in their bias. In Appendix we in¬ 
troduce an adaptive algorithm to sample configurations 
according to the optimal Bayesian ensemble distribution, 
Eqs. (I?n) and (1^^ . without the need of multiple replicas. 

The above procedure can also be applied to problems 
with a set of N discrete configurations. We determine 
their optimal weights a = 1,... ^ N, by maximiz¬ 

ing the negative log-posterior 


C{wi,.. .,wn) = 6 » Wain— ^ 

a. ^ 

(Ea W;a 2 /*(Xa) - 


(23) 


E- 


2a2 


AE 


Wo 


The extremum of this negative log-posterior satisfies the 
following set of coupled nonlinear equations 


oc 


Wa exp 


-E- 


»(Xa) (E 


(24) 


(opt) 

^ ^7 ' 


yi(x^) - 


(obs) 


Baf 


which can be solved, for instance, by iteration, starting 
from Wa (see below). Alternatively, one can use simu¬ 
lated annealing or other optimization methods to locate 
the global minimum of C. 

We note that the resulting optimal weights co¬ 

incide exactly with the EROS weightsi^ if all N configu¬ 
rations are reweighted. In an illustrative example below, 
we will also consider the case where sets of n configura¬ 
tions are drawn from po(x) and reweighted according to 
EROS. In the limit of n —oo, each structure enters this 
starting ensemble with the correct relative weight. After 
EROS reweighting, using Eq. (1211) with w° = 1/n, one 
thus converges to the optimal Bayesian ensemble refine¬ 
ment weights for n —>■ oo. This convergence will 

be illustrated in a numerical example. 


C. Bayesian ensemble refinement in configuration space: 
The replica method 


We require that for a single replica, A^ = 1, one recov¬ 
ers the result of common-property ensemble refinement, 
Eq. (1121) . At the other extreme, N —)■ oo, we want to 
recover the optimal Bayesian configuration space distri¬ 
bution, Eq. (ED. 

We use N equally weighted replicas xi,..., xjv to de¬ 
fine a function space of realizable probability densities, 
p(x) — N~^ Ea=i ~ where S(x) is Dirac’s delta 
function. To determine the relative weight of these p(x), 
and in turn of the underlying replica states {xq,}, we use 
the posterior functional Eq. dzD with the likelihood in 
Eq. mD, 


'T~i \ ( \lJxl — ^ f p(^) \ —/2 

7^[/?(x)|data] oc e J ^ ^ ' 

oc ewEc^"Po(’^“)“x72 


(25) 


= n [Po(Xa)] 




For the evaluation of the entropy integral in the ex¬ 
ponent, we coarse-grained 6(x) as 1/A for |a:| < A/2 
and 0 otherwise; divided out a term proportional to 
In A because we only require relative posterior proba¬ 
bilities; and then took the limit A ^ 0. Having chosen 
A-replica distributions as function space, the p(x) are 
now parametrized by {x^}, and in Eq. (|2^ the posterior 
functional has become a function that can be interpreted 
as the sampling distribution of the replica states {x^}. 
Here, we are interested in sampling from the extremum 
of the posterior functional, i.e., the optimal Bayesian en¬ 
semble distribution. To suppress fluctuations around the 
extremum as N oo, we take the posterior function in 
Eq. (1251) to a power growing with JV. Taking it to the 
power JV/B, we arrive at the replica sampling distribution 


N 


P]v(xi,X2,...,xjv) (X ]^po(xa)exp(-Ax^/26() = (26) 


exp 


N 


N ■ 


-f 3 j 2 uixo.)--Y: 


EE, 

N 


-E 


(obs) 


0=1 


Baf 


The optimal weights determined self-consistently 
from Eq. dMD can be used for reweighting of an ensem¬ 
ble of structures drawn from the reference distribution. 
However, we cannot use these weights directly to sample 
the ensemble of configurations on the fly, lacking explicit 
solutions of Eqs. ED and (12^ (but see the Appendix for 
an adaptive method). 

To circumvent the problem, we adopt a replica-based 
approach in which averaged observables are calculated 
over multiple copies of the system, In the replica 

simulations, N copies (replicas) x^ of a molecular system 
are simulated in parallel using the same energy function 
U{xa), subject in addition to a biasing potential that 
attempts to match the observables obtained by averag¬ 
ing over the N copies to the experimental measurements. 


with the Boltzmann factor for potential energy U (x) 
defining the reference distribution. The second term in 
the exponent defines the biasing potential applied to the 
ensemble of replicas. 

We now show that under Eq. dUD in the limit N ^ oo, 
individual replicas indeed sample configurations accord¬ 
ing to the optimal Bayesian ensemble refinement distri¬ 
bution in Eq. ED- Without loss of generality, we deter¬ 
mine the distribution of replica I, since all replicas are 
equivalent. To this end, we rewrite the last term in the 
exponent of Eq. dUD as 
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= E 

i 

+E 




N 


0a? 


N 


J2yd^c,)-Y- 


(obs) 


q ;=2 


20 a? N 

20a? 


(27) 


Since the first term on the right is of order 0(1/N) and 
the second term is of order 0(1), the first term vanishes in 
the limit of IV —^ oo. In this limit, we can use a mean field 
approximation for the second term, ~ 

f dxp{x)yi{x). The last term on the right of Eq. (I?7l) is 
independent of xi and thus cancels in the normalization 
of the resulting distribution over Xi. In the limit of TV ^ 
00 , we thus arrive at a probability density for replica I 
(and, by symmetry, for all others) of 


p(xi) oc 
Po(xi)exp 


(28) 


-E- 


y,(xi) / dx'p(x')y*(x') - Y? 


(obs) 


0a? 


On the basis of the preceding analysis, we note that if 
the term in Eq. (HH) were scaled by iV“ instead of N, 
with a > 0, then replica ensemble refinement would ex¬ 
hibit a “phase transition” as a function of the exponent a 
in the “thermodynamic limit” of infinitely many replicas, 
N —>■ oo. For sub-linear scaling, 0 < a < I, the effect of 
the x^ bias vanishes with increasing N and the replica en¬ 
semble gradually falls back to the reference distribution; 
for super-linear scaling, a > I, the x^ bias diverges to 
infinity everywhere except at states that satisfy the con¬ 
straints exactly, making it equivalent to a sum of delta 
functions that impose strict constraints on the averages; 
only for linear scaling, a = 1, replica sampling converges 
to the distribution of optimal Bayesian ensemble refine¬ 
ment. This iV-scaling becomes explicit in the Gaussian 
models studied by Roux and Weare^^ and below. 


D. BioEn method combining replica simulations with 
EROS 


which is indeed identical to the probability density of 
optimal Bayesian ensemble refinement in configuration 
space, Eq. (EJ. Below, this identity will be demonstrated 
explicitly for two analytically tractable models, and for 
a numerical model. 

Equation (I26p for the probability density in Bayesian 
replica ensemble refinement is nearly identical to that ob¬ 
tained by Cavalli et ah— as a weighted integral over 
the maximum entropy solution with strict constraints on 
the observables. However, there is one crucial difference: 
their x^ in the exponent of the reweighting factor is miss¬ 
ing the factor N scaling the biasing potential with the 
number of replicas. Not scaling x^ by N would result 
in decoupling of the replicas, as shown explicitly below. 
Indeed, early replica ensemble-refinement simulations in¬ 
troduced the scale factor N empirically;^ and it appears 
in a recent preprint^! released shortly after submission of 
this paper and release of a preprint. 

Roux and Weare^ also considered a maximum entropy 
approach with strict constraints on the ensemble aver¬ 
ages. In addition, these authors examined the conver¬ 
gence behavior of A^-replica simulations. For the specific 
example of a Gaussian reference distribution and a har¬ 
monic restraint on the mean, Roux and Weare^^ found 
that to recover the mean exactly for large N, the effec¬ 
tive spring constant in the biasing potential had to grow 
faster than linearly in N. For the general case, the choice 
of the spring constant was left open. In our Bayesian 
formulation, we account for the uncertainties of the mea¬ 
sured averages. It is therefore not to be expected that 
the measurements are satisfied strictly in the refined en¬ 
semble. This will be illustrated below by the analytical 
solution for the analogous problem of a Gaussian refer¬ 
ence distribution within our Bayesian framework. More 
generally, the explicit accounting for errors ai provides a 
basis for combining different measurements in a properly 
balanced manner. 


In the following, we describe the BioEn algorithm 
that simultaneously addresses the possible shortcomings 
of EROS and Bayesian replica ensemble refinement and 
helps us in the choice of the 0 parameter. By combin¬ 
ing EROS and replica simulations, one can accelerate 
the convergence toward the optimal Bayesian ensemble. 
This combination also makes it possible to obtain optimal 
Bayesian ensembles for a wide range of 0 values without 
the need to run actual Bayesian replica simulations for 
all of them. Covering a broad 0 range is important in 
practice to choose a suitable confidence parameter 0 that 
achieves a good balance between reference distribution 
and data. 

In EROS, one can work with large numbers n of struc¬ 
tures without significant computational costs; however, 
if po (x) and p(°P*)(x) have little overlap in configuration 
space, then these structures may not be representative of 
the refined ensemble, resulting in slow convergence with 
increasing n, as shown below. By contrast, the computa¬ 
tional cost of sampling Bayesian replica ensembles with 
large N is high. To accelerate the convergence toward N- 
independent optimal Bayesian ensemble refinement, one 
can combine the replica and EROS methods (and the 
adaptive method described in the Appendix). Even with 
relatively small the replica simulations can be used to 
enrich the sample of configurations fed into EROS refine¬ 
ment. To give these configurations x the proper weight 
proportional to po (x), with x coming from different sim¬ 
ulations with and without bias, one can for instance use a 
histogram-free version of the multidimensional weighted 
histogram analysis method (WHAM)i^“— 

We first need to reweight the A^-replica states in dif¬ 
ferent simulations according to the reference distribu¬ 
tion. By running N unbiased, uncoupled simulations 
according to po(xi)po(x 2 ) • ■ •Po(xAr) (or, simply, one un¬ 
biased simulation as a source for configurations x^ that 
are then combined at random to form pseudo A^-replica 












states) and one or several biased, coupled simulations 
(e.g., with different 9i) according to P 7 v(xi,..., x^v), one 
obtains representative sets of fV-replicas states. To com¬ 
bine them, one needs to assign the proper relative weight 
to the fc-th sampled 7V-replica state {'x.a}i,k in run i, 

as given by the reference distribution na=i^'o(xQ). Fol¬ 
lowing Ref. we first determine the free energies Fi 
of each A^-replica simulation i {i = 1,, Mrun) by iter¬ 
atively solving the coupled set of equations 


= -/3F _ 


Mrun 

= EE 






(29) 


where Fi = 0 by definition. The outer sums on the right 
extend over the runs (indexed by m and j) and 

the rim iV-replica states (indexed by k) in run m. The 
biasing potential is defined as t/^dx^}) = Nx^l‘2di in 
biased runs i, and Ui = 0 in unbiased ones, with as 
in Eq. (051) . To obtain the relative weight of replica 
state k in run i, {xali^fc, corresponding to the reference 
distribution, we set the d-term in Eq. (2) of Ref. Unequal 
to one for only this replica state and to zero for all others. 
We then obtain 


w 


0 

i,k 


OC 


Mrun 

i=i 


-1 


(30) 


This problem is closely related to a Gaussian model for 
replica simulations studied by Roux and Weare,— with 
the difference that here we explicitly account for the un¬ 
certainty a in the measured mean Y. The negative log- 
posterior Eq. becomes 


C[p{x)] = 0 / dxp{x)\n 


p{x) [Jdxp(x)x -Yy 
Poix) 2a^ 


-l-A / dxp{x). 


(33) 


The extremum of C satisfies 


6C 

8p{x) 


= 9 


In 


Pjx) 

Po{x) 



X [/ dx' p{x')x' “ A"] ^ 


(34) 

This integral equation can be solved with a Gaussian 
ansatz, p{x) = (27rs^)“^/^ exp[—(a; — /r)^/(2s^)], with 9 
set to one without loss of generality, since a change in 
9 here corresponds to a rescaled (see the Appendix 
for an alternative solution method using generating func¬ 
tions). By substituting the ansatz into the integral equa¬ 
tion and solving for the coefficients of powers of x, it 
follows that the mean of the optimal probability density 
is 


p = Y /{s^ + ay. (35) 


where the sum extends over the different simulations j. 
Each of the N configurations x^ in a given replica state 
{xa}i,fe then has the same relative weight as the 
replica state. The resulting set of configurations together 
with their estimated relative weights can then be used 
as input for an EROS refinement according to Eq. ([24|) . 
With the resulting EROS-refined weights, one obtains the 
BioEn ensemble of configurations enriched by the biased 
Wreplica simulations, yet properly reweighted to correct 
for effects of finite numbers N of replicas. Importantly, 
this reweighting approach also allows one to obtain op¬ 
timally reweighted ensembles for different 9, simply by 
re-running EROS. 


III. RESULTS AND DISCUSSION 

A. Illustrative examples of ensemble reweighting 

e. Ensemble reweighting of the mean. To illustrate 
and test the ensemble reweighting formalisms described 
above, we consider a simple, analytically tractable prob¬ 
lem. Gonsider a one-dimensional configuration coordi¬ 
nate X with a Gaussian reference distribution 

Po{x) = (27rs^) (31) 

and the mean Y = x as observable, with uncertainty ct, 
such that 


The optimal probability density of Bayesian ensemble re¬ 
finement is thus a Gaussian, 


p(opt)(a;) = (27rs^) ^^^exp 



Vs^ A 


2s2 


21 


(36) 


The variance remains unchanged from the reference 
distribution, but the mean is shifted from zero toward the 
ensemble average Y according to the relative weights of 
the variances in the reference distribution, s^, and in the 

error, tr^. In the limit ct <C s, the mean approaches the 
measurement Y ; in the opposite limit a s, the mean 
remains near that of the reference distribution, i.e., at 
zero. 

Bayesian replica ensemble refinement for this problem 
is also analytically tractable. For N replicas, with 9 = 1, 
we have 


exp 


Pn{xi, .. .,xn) oc 




~2s^ 






(27rs^)^/^ 


. . 

Since this replica probability density is symmetric in ex¬ 
changes of the Xi, all replicas sample the same space, and 
we can integrate out all Xi but xi to obtain a marginal¬ 
ized replica probability density 


2 [/ dxp{x)x — F]^ 


(32) 


qN{xi) = J dX2 dX3 


■■dxNPN{xi,...,XN)- 


(38) 
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The Gaussian integrals can be carried out, resulting in 
qN{xi) being Gaussian with mean Fs^/(s^ + cr^) and 
variance [l — s^/-/V(s^ + cr^)]. For = 1, we re¬ 
cover the probability of common-property refinement, 
Eq. ([12]), with variance l/(s“^ -I- cr“^). In the limit of 
N ^ oo, the variance approaches We thus have 
limAr_).oo 9iv(a:) = with p^°P^\x) the optimal 

Bayesian ensemble refinement result in Eq. p6p . Series 
expansion shows that this limit is approached asymptot¬ 
ically as ln[q 7 v(a^)/p^°^*H 2 ^)] = f{x)/N -\- 0{1/N'^) with 
f{x) = (s^ — x^)/2{s^ + cr^) for fixed x, i.e., as 0(l/iV) 
in the error of the logarithm of the probability density. 

Importantly, if we had left out the factor N scaling the 
in the exponent of Eq. dSll), the mean would instead 
have been Ys^/{s^ + Na^). The mean would thus ap¬ 
proach zero, i.e., the value of the reference distribution, 
as the number of replicas is increased, N —)> oo, irrespec¬ 
tive of the uncertainty tr > 0. This result makes it clear 
that the in the replica model has to be scaled by N 
to obtain a result that is size-consistent in the number of 
replicas N. 

f. Ensemble reweighting of the second moment. An¬ 
other analytically tractable ensemble reweighting prob¬ 
lem is obtained for a non-linear observable, the second 
moment E = x^, again for the Gaussian reference dis¬ 
tribution in Eq. m- For the second moment, we have 
= [/ dxp(x)x^ The optimal solution then 

has to satisfy the integral equation 


6C 

5p{x) 


= e 


In 


Pjx) 

Poix) 


+ I 


x^ [/ dx' p{x')x'^ — Y] 
-^5-h A . 

(39) 


To solve this integral equation, we make a Gaus¬ 
sian ansatz with zero mean and variance t^, p(x) = 
exp[—x^/(2t^)]/(27rf^)^/^, with 9 set to one without loss 
of generality, and find 




2Ys‘^ -a‘^ + 


8a2s4 + (a2-2rs2)' 


ll/2 


4s^ 


(40) 


In the limit of no uncertainty in the measurement, cr —>■ 0, 
we find that t^ —>■ E, i.e., we have a Gaussian with ex¬ 
actly the measured second moment. In the other limit 
of complete uncertainty, tr —>■ oo, we have t^ = s^, i.e., 
no change relative to the reference distribution. In be¬ 
tween, t^ is a nonlinear interpolation between these two 
extremes. 

This problem is also analytically tractable for Bayesian 
replica ensemble refinement. For N replicas with 6 = j3 = 
1, the joint probability density is 

PAr(Xi, . . . ,XAr) OC (41) 


To integrate out X 2 to xat, we introduce (N — I)- 
dimensional spherical coordinates, with 


The marginalized distribution then becomes 

gN{x\)= j dx 2 dx 3 • ■ •dxjvPAr(xi,... ,XAr) (42) 

fY+xl 

r^+xl 


oc I dr ^ exp 


= J drf{r\xi) 


2s2 


As it turns out, the remaining one-dimensional integral 
can be carried out analytically, giving an expression in 
terms of confluent hypergeometric functions. However, 
to take the N ^ oo limit, it is advantageous to use a 
saddle-point approximation of the integrand in terms of 
a Gaussian, /(r|x) ~ /o exp[—(r —/r)2/2n], that becomes 
increasingly accurate as N increases. We find that the 
variance x in r becomes independent of N and x in the 
limit of large N, such that only the value /o = /(/r|x) at 
the extremum needs to be considered in the construction 
of the marginalized distribution of x. In the limit of 
fV —>• oo, /o depends on x as 

ln/o(x) ss ln/o(x = 0) - (43) 

where the x = 0 value cancels in the normalization of 
the marginalized distribution gjv(x). The marginalized 
distribution of x is thus a Gaussian centered at zero with 
variance as given in Eq. dini). The result of Bayesian 
replica ensemble refinement thus converges to the prob¬ 
ability density of optimal Bayesian ensemble refinement 
in the limit of large N. This correspondence once again 
stresses the importance of scaling the y^ by N to main¬ 
tain proper coupling and convergence in the limit of large 
numbers N of replicas. 

g. Convergence of Bayesian replica ensemble refine¬ 
ment. To examine the convergence of the Bayesian 
replica ensemble refinement with the number N of repli¬ 
cas towards the optimal Bayesian ensemble refinement 
solution, we have performed Metropolis Monte Carlo 
simulations for a one-dimensional system defined by a 
double-well potential energy function j3U{x) = 3[(x — 
a)^ — b'^]^/b'^ with a = (M -|- I)/2 and b = [M -\- 2)/4. 
We have discretized the potential at x = 1,2,...,M 
with M = 50. The mean of the corresponding refer¬ 
ence distribution pci{x) oc exp[—/3[/(x)] is at x = a = 
25.5. In the ensemble refinement, we set the target half¬ 
way between the maximum and the upper minimum, at 
E = (6 -I- 5M)/8 = 32. The resulting y^ then becomes 
y2 = (7V“^ Xa — E)^/(t^, with a set to one. Sys¬ 

tems with TV = 2,4,8,..., 128 replicas were sampled with 
Monte Carlo simulations, and the distributions qN^x) av¬ 
eraged over all replicas calculated. 

Figure m compares the resulting distributions gAr(x) of 
X from Bayesian replica ensemble refinement to p^°'^^\x) 
from optimal Bayesian ensemble refinement. We find 
that the ensemble reweighted distributions shift contri¬ 
butions from the left well to the right well to match the 
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FIG. 2. Optimal Bayesian ensemble refinement and Bayesian 
replica ensemble refinement for one-dimensional donble well 
system with restraint on the mean, as indicated by the ver¬ 
tical black line. (Top) Marginalized distributions qN{x) from 
Bayesian replica ensemble refinement with N replicas (lines) 
compared to the optimal Bayesian ensemble refinement soln- 
tion (open sqnares) and to the reference distribution 

Po{x) (thin line with open circles). Arrows indicate changes 
relative to po(*). (Bottom) Error A[lnqjv(3;) — lnp^°P*^(a;)] 
in lngjv(a;) scaled by the number of replicas N. Part of the 
scatter is a reflection of the stochastic Monte Carlo sampling 
of the Bayesian distributions qN{x). 


target mean, but by and large retain the shape within 
each well of the potential U{x) defining the reference 
distribution. The only exception is = 2, where the 
restraint on the mean effectively pulls one of the repli¬ 
cas out of the first minimum into the barrier region. We 
also find numerically that the distributions qNix), av¬ 
eraged over all replicas N, converge asymptotically (for 
large N) to the optimal Bayesian ensemble refinement 
solution p(°P*\x) as ln( 7 jv(a^) ~ lnp(°P*\x) -I- f{x)/N 
for N > 16. Numerical results for the master curve 
f{x) = iV ln[q 7 v(a:)/p^°P*^(x)] are shown in the bottom 
panel of Figure [5] bottom; the actual error in an N- 
replica simulation is approximately 1/iV-th of f{x). The 
probability density from Bayesian replica ensemble re¬ 
finement thus appears to converge asymptotically as 1 /N 
to the optimal Bayesian result, as in the first analytically 
tractable example above. 

h. Convergence of EROS. We have used the same 
model to examine the convergence of EROS in the case 
where n representative configurations are drawn accord¬ 
ing to pq{x) and then reweighted according to Eq. (l24l) . 
Specifically, we have drawn n values of x with replace¬ 
ment according to the Boltzmann distribution for the 
double-well potential with M = 50. The resulting n 


points, indexed as a(l),... ,a(n), were then reweighted 
according to Eq. (IMl) . with w^a{i) ~ *■ 

resulting EROS weights were then averaged for each of 
the M possible values of x, 

Pn{x = Xa) = , ( 44 ) 

where (• • •) indicates an average over repeated selections 
of samples of size n, and Sa^-y = 1 if a = 7 and zero oth¬ 
erwise. In this way, we estimated the expected weight of 
conhguration a in repeated EROS runs using n represen¬ 
tative ensembles. 

In Figure [3l we show that the distribution Pn{x = Xa) 
obtained by repeated reweighting indeed converges to the 
optimal Bayesian ensemble rehnement result in the limit 
of large n. For the specific example, the error in lnp„(x) 
scales as 1 /n. Interestingly, the relative error obtained for 
EROS samples of size n is comparable to that of Bayesian 
replica ensemble refinement with n replicas. 



FIG. 3. EROS and optimal Bayesian ensemble refinement 
for a one-dimensional double well system with restraint on 
the mean, as indicated by the vertical black line. (Top) 
Marginalized distributions Pn{x) from EROS with n config¬ 
urations drawn according to po (lines) compared to the op¬ 
timal Bayesian ensemble refinement solution p^°^^\x) (open 
squares). (Bottom) Error n[lnp„(a;) — lnp^°P*^(a;)] in lnp„(a;) 
scaled by the sample size n. Part of the scatter is a reflection 
of the stochastic Monte Carlo sampling of configurations in 
EROS. 

i. BioEn improves convergence by combining EROS 
and replica simulations. We have also tested the BioEn 
combination of EROS and replica simulations to speed up 
convergence to the optimal Bayesian ensemble distribu¬ 
tion. Eigure U demonstrates the dramatic improvement 
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achieved in the combined method for the model of Fig¬ 
ures [H and [31 After EROS reweighting of the configura¬ 
tions sampled in unbiased and biased runs with = 2, 4, 
and 8 replicas, we find that the significant systematic er¬ 
rors in the Bayesian replica ensemble distributions 
disappear, and only small, primarily statistical errors re¬ 
main. 



0 10 20 30 40 50 


X 

FIG. 4. BioEn method applied to double-well system. The 
bottom three panels show the error Agjv(a;) = qN{x) — 
p(opt)(j,) jjj ensemble distributions obtained from regu¬ 
lar replica ensemble refinement with A = 2, 4, and 8 replicas 
(purple line with symbols) relative to the optimal Bayesian en¬ 
semble distribution (shown in the top panel). Also 

shown is the error of the BioEn method combining EROS 
and replica refinement (green lines). Please note the change 
in scale of the vertical axes for different N. See Figure [2] 
for the error in lnqAr(a:) without BioEn for larger numbers of 
replicas N. 


B. Practical considerations 


j. Combining common-property and ensemble- 
average refinement. For simplicity, we have so far dealt 
separately with observables reporting on properties 
common to all configurations and on ensemble averages. 
However, these data can be combined readily within 
the above formalisms. In the respective posteriors, the 
likelihood terms according to Eqs. m and m simply 
have to be multiplied. The optimal Bayesian ensemble 
distribution then becomes 


p(opt)^x) cx po{x) exp 


-E 




(45) 




for m restraints on common properties, and M — m re¬ 
straints on ensemble averages. Equation (1351) is a com¬ 
bination of Eqs. (HU and dan. An analogous expression 
generalizes Eq. dm) for the discrete case. 

k. Data from single-molecule experiments. Data 
from single-molecule experiments can be incorporated in 
the different refinement procedures. In principle, one 
could even fit the data individually, one molecule at a 
time, using single-copy refinement. In a more practical 
approach, one can use the techniques of ensemble refine¬ 
ment to fit the single-molecule data lumped together in 
a way that produces not just averages but also distribu¬ 
tions of observables. An example are FRET efficiencies 
E measured by single-molecule spectroscopy. As a basis 
for ensemble refinement one can for instance deter¬ 
mine FRET-efhciency histograms 

from photon arrival trajectories and use the deviations 
between histogram counts calculated for an ensemble 
model. Hi = yi[p{x)], and measured in experiment, 
j^iohs) ^ ^(obs)^ construct a with appropriate error 
models. With such a y^, one can then use both EROS^d^ 
and Bayesian replica ensemble refinement. 

l. Dynamic, time-dependent data such as NMR 
NOEs. Many relevant observables are not just functions 
of a configuration, yi = yi{'x), but depend also on the dy¬ 
namics. Examples are the NOE intensity and other NMR 
relaxation parameters that depend on the rotational and 
translational dynamics of the spin systemA^ Whereas it 
is outside of the scope of this article to refine an entire dy¬ 
namical model to such data, we can make some progress 
in this direction by considering a reduced problem. Ig¬ 
noring self-consistency issues, we can attempt to refine 
an ensemble of configurations x that evolve in time un¬ 
der the Hamiltonian of the molecular simulation energy 
function 17(x) defining the reference distribution po(x), 
but are distributed according top(°P*^(x) instead ofpo(x). 

To calculate the observables associated with a partic¬ 
ular configuration x, one can use trajectory segments 
passing through x. Each of the sample configurations 
X would then serve as an initial value, with Maxwell- 
Boltzmann velocities, for one or multiple trajectory seg¬ 
ments of length t/2. To center the trajectories at x 
with respect to time, one can run trajectory pairs of 
length t/2, initiated from x with sign-inverted Maxwell- 
Boltzmann velocities, one running forward and the other 
running backward in time. Stitching the two segments 
together at x, after sign-inverting the velocities of the 
backward segment, one obtains a continuous trajectory of 
length T centered time-wise at x. For each of these trajec¬ 
tories, the time-dependent observable yi = 2 /i[x(t)|x( 0 ) = 
x; — r/2 <t< t/2] can be calculated, possibly averaged 
by repeated runs over different choices of initial Maxwell- 
Boltzmann velocities. The yt calculated in this manner 
can be treated as simple functions of x = x(0) to enter 


X exp 


M 

E 


yi(x) 
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the in the same way as static data. The trajectory 
length r should be set such that the yi can be calculated 
with reasonable accuracy (i.e., as multiples of the rel¬ 
evant correlation times). After refinement, one obtains 
an ensemble of configurations that jointly account for the 
time-dependent observables yet stay close to the reference 
distribution. We note that (possibly overlapping) trajec¬ 
tory segments could also be obtained from long equilib¬ 
rium trajectories, or even from A^-replica simulations. 

As the simplest approach of refining also the actual dy¬ 
namics, one can perform in addition time scaling, 1 1 —>■ at, 
which could for instance account for incorrect viscosities 
of the water model used in the molecular dynamics sim¬ 
ulations. The time-scale parameter a can then be opti¬ 
mized as well in the ensemble refinement. 

771. Solving the EROS equations. One can obtain the 
EROS weights in Eq. (j24ll by numerical minimization of 
C in Eq. (|23)l . which can be accomplished by a variety of 
techniques with and without gradient calculations. Al¬ 
ternatively, one can solve Eq. (l24l) directly, for instance 
by iteration until self-consistency is achieved. A possi¬ 
ble route is to start from the weights ~ in 

the reference distribution, and then iterate Eq. (IM)) to 
get an updated estimate of This procedure can 

be repeated until the change in old and new approxi¬ 
mations drops below a chosen threshold. We found that 
mixing the old and new approximations geometrically, as 

^ with 0 < a; < 1, led to stable fixed- 
point iterations. The mixing parameter x controls stabil¬ 
ity (x 1) and speed (x ~ 0). We further improved the 
stability and convergence behavior by starting at a large 
value of 0, where the deviations from the reference dis¬ 
tribution are small, and then reducing 0 in repeated 
fixed-point iterations to sweep out a broad 0 range. The 
resulting EROS refinements for different 0 can help us in 
the choice of 0, as discussed next. 

n. Choosing the confidence factor 0. The Bayesian 
ensemble refinement methods described here contain one 
free parameter, the factor 0 that enters the prior and 
quantifies the level of confidence one has in the refer¬ 
ence probability density po(x). Large values of 0 ex¬ 
press high confidence (for instance, if one uses a well- 
tested atomistic force field instead of a more approximate 
coarse-grained representation, both being well sampled). 
Whereas formally, one would choose 0 before refinement, 
in practice one may want to readjust this choice after 
the fact to achieve a better balance between reference 
distribution and data. By reporting the chosen 0 and 
the corresponding Kullback-Leibler divergence between 
reference and optimal distribution, the inference process 
becomes transparent. 

Akin to L-curve selection in other regularization ap¬ 
proaches to inverse problems;^ one can find an appropri¬ 
ate value of 0 by plotting the Kullback-Leibler divergence 
(relative entropy) 5'kl = - ) 

against the obtained in EROS reweighting for differ¬ 
ent values of 0. As discussed above, EROS reweight¬ 


ing can (and should) be performed even when Bayesian 
replica ensemble refinement is used to obtain the config¬ 
urations to avoid finite-A^ effects. The value of 
can then be interpreted as the average error in the energy 
function C/(x) used to define the reference distribution, 
since by definition Skl = P J dxp(°P‘)(x) [17 (°p*)(x) - 
U{x)] for p(°P*\x) oc exp[—/3{7^°P*^^ (x)], given that 
the additive constant in C/Opt) (|x) is chosen such that 
the partition functions (and thus free energies) of 
the optimal and reference distribution are identical, 
/ dxexp[—,817(°P*)(x)] = f dxexp[—/317(x)]. This allows 
one to choose a 0 value on the basis of expectations con¬ 
cerning the magnitude of this errorf^^ Conversely, one 
can also take a more pragmatic approach and choose a 
value of 0 at the kink of the S'xL-versus-y^ curve, where 
a further decrease in 0 does not produce a significant 
improvement in the fit quality but causes a large devia¬ 
tion from the reference distribution, as measured by S'kl- 
This approach is taken in the MERA web server for the 
refinement of peptide Ramachandran maps against NMR 
data i^°i^^ There one accepts a a certain percentage 
point (say, 25 %) above the minimal x^ obtained for 
6»Ri0. 

Finally, the confidence parameter 0 can also be treated 
as a nuisance parameter with an uninformative prior, 
p{0) oc 1/d for 0 > 0. One could include 0 in the max¬ 
imization of the posterior or attempt to integrate it out 
in a weighted average over ensembles obtained for fixed 
0 . 


IV. CONCLUSIONS 

We have described different Bayesian approaches to en¬ 
semble refinement, established their interrelations, and 
shown how they can be applied to experimental data. 
The Bayesian approaches allow one to integrate a wide 
variety of experiments, including experiments reporting 
on properties common to all configurations and on av¬ 
erages over the entire ensemble. We started from a 
Bayesian formulation in which the posterior is a func¬ 
tional that ranks the quality of the configurational dis¬ 
tributions. We then derived expressions for the opti¬ 
mal probability distribution in configuration space. For 
discrete configurations, we found that this optimal dis¬ 
tribution is identical to that obtained by the EROS 
modelji^ To perform ensemble refinement “on the fly”, 
or to enhance the sampling of relevant configurations 
in cases where the reference distribution and the opti¬ 
mal ensemble density have limited overlap, we considered 
replica-simulation methods in which a restraint is im¬ 
posed through a biasing potential that acts on averages 
over all replicas. We showed using a mean-field treatment 
that to obtain a size-consistent result, the biasing poten¬ 
tial has to be scaled by the number N of replicas, i.e., 
the restraint has to become stiffer as more replicas are in¬ 
cluded. Then, the Bayesian replica ensemble refinement 
converges to the optimal Bayesian ensemble refinement 
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in the limit of infinitely many replicas, TV —>■ oo. This 
result clarifies the need to scale the biasing potential, 
which arises also in maximnm entropy treatments with 
strict constraintswith the number of replicas N 
to obtain a size-consistent result. An adaptive method, 
as described in the Appendix, provides a possible alter¬ 
native to replica-based approaches. 

The BioEn approach combines the replica and EROS 
refinement methods. The replica simulations are used 
to create an enriched sample of configurations. A free- 
energy calculation is used to determine the appropri¬ 
ate weights according to the reference ensemble. The 
optimal weights according to Bayesian ensemble refine¬ 
ment are then determined by EROS. This combined 
approach addresses the shortcomings of either method, 
i.e., the need to work with relatively small N in replica 
simulations, and potentially limited overlap of reference 
and optimized distribution in EROS. Using free-energy 
reweighting methods, it may also be possible to include 
configurations from other types of ensemble-biased sim¬ 
ulations, including those designed to satisfy measure¬ 
ments exactly4i^— Because of the flexibility and ex¬ 
pected rapid convergence, the BioEn method combining 
Bayesian replica and EROS refinement should perform 
well in practical applications. 

In two examples that are analytically tractable and 
one requiring numerical calculations, we demonstrated 
the eqnivalence of the different methods in the appro¬ 
priate limits. We also studied the convergence proper¬ 
ties of Bayesian replica simulations with the number of 
replicas N, and of EROS reweighting with the sample 
size n. Our examples showed similar convergence of the 
log-probability of the two refinement approaches to the 
optimal limit as 1/7V and 1/n, respectively. 

The BioEn approach also addresses a major issue in 
Bayesian ensemble refinement, namely the choice of 9. 
This parameter enters the prior to express our confidence 
in the reference distribution. We find that in the optimal 
Bayesian ensemble distributions, a change in 9 is simply 
equivalent to a uniform scaling of all squared Gaussian 
errors af. Since EROS reweighting is usually orders of 
magnitudes less costly than sampling multiple replicas 
in coupled molecular simulations, one can efficiently ob¬ 
tain estimates of the relative entropy 5'kl for different 
9. Erom plots of S'kl against one can make an edu¬ 
cated choice of 0 , as in other regularization approaches 
to inverse problems)^ 

Finally, the reweighting of individual structures, ei¬ 
ther directly using EROS or in the combined approach, 
should prove useful in the optimization of potential en¬ 
ergy functions by fitting them to experimental data (see, 
e.g., Refs. |47 h 4^ . If one has a good understanding of 
the sources of the errors in the energy surface U (x), pa¬ 
rameters in U can be fitted directly, as was done, e.g., for 
the star force fields of proteins^ and for RNA.~ At the 
other extreme, Bayesian approaches have been used be¬ 
fore to infer entire energy functions/^-^ Here, we suggest 
to concentrate on the change in weight of structures x^, 


i.e.. In= —/3 A[/q + const., which defines the 
required change AUa in the potential energy to match 
experiment. By examining the correlation of this force 
field error AUa with elements of the force field (e.g., pep¬ 
tide dihedral angles^ or base stacking interactions^) , it 
might be possible to identify sources of the error and then 
correct for them. 

It is important to emphasize that a number of assump¬ 
tions enter the ensemble refinement procedure. The cen¬ 
tral (and declared!) assumption is that of a reference 
distribution. Here it may be possible to use combina¬ 
tions of multiple potential energy functions Ufe(x), repre¬ 
senting different force fields or conditions k, that jointly 
cover the relevant phase space better than any potential 
alone. One way to mix such potentials is by using a mul¬ 
tistate model^ U{x) = — 7 “^ In^^, exp[— 7 Ufc(x) + et], 
where 7 is the mixing “temperature” and Ck are energy 
offsets that weight the different force fields. Another im¬ 
portant challenge is that one has to estimate errors both 
in the measurements and in the calculation of the observ¬ 
ables. Procedures to account for uncertainties in the er¬ 
ror estimates have been developedj^ Within the present 
framework, one could include error distributions in the 
maximization of the log-posterior, or average over opti¬ 
mal solutions obtained for different errors. In addition, in 
many cases the Gaussian error model may not be appro¬ 
priate. As discussed, to handle more general error mod¬ 
els, one can substitute the log-likelihood lnP[data|p(x)] 
for —x^/2 in 7^[p(x)|data]. 

Overall, we expect our exploration of different 
Bayesian ensemble refinement approaches to serve both 
as a basis for practical applications and as a starting 
point for further investigations. In particular, we have 
here not considered an orthogonal refinement approach 
in which one seeks to represent the ensemble by a minimal 
set of structuresAs we had shown before, EROS 
and minimal ensemble refinement, properly interpreted, 
can give consistent resultsJ^ However, the relation of the 
different methods is not well understood, e.g., concerning 
the limiting behavior for large sample sizes. 
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Appendix A: Adaptive sampling of optimal Bayesian 
distribution without replicas 


We define probability densities of the observables alone 
by integrating out all other degrees of freedom, 


-fo(y) = f dxpo(x) - ?/i(x)], (Al) 

. M 

P{y) = / dxp(x) - ?/*(x)]. (A2) 

i=l 

According to Eqs. (I^Tl) and these two distributions 
are related to each other, 


M 


P{y) <x Po(y)exp -- ^ y,(s 


(A3) 




for possibly correlated Gaussian errors, where the gen¬ 
eralized forces fj have to be determined self-consistently 
such that 


/, = I dy P{y)y, - = (y,) - . (A4) 

As a consequence, p(x) oc po(x) exp[—0“^y^(x)S“^f] 
where superscript T indicates the transpose in vector- 
matrix notation. The biasing potential thus assumes a 
functional form linear in the ?/i(x), as seen in standard 
maximum entropy approaches (see, e.g.. Refs. E3.Ei and 
[30h . but the generalized forces fi take on different val¬ 
ues here. We also note that the forces fj defining 
the optimal distribution can be interpreted mechani¬ 
cally. With Eq. (IA3I) one finds that the mean force 
trying to “restore” the reference distribution, = 

f dyP(y)i9[—lnPo(y)]/5y = —is exactly bal¬ 
anced by the mean force to fit the data, = 

f dy P{y)d{x^/2)/dy = up to a factor 9, with 

from Eq. ([5]) with instead of yj°'°^^. 

Formally, the fj can be obtained by solving M 
coupled nonlinear equations. We define generat¬ 
ing functions ((io(z) = J dy Po{y) exp{y ■ z) = 

J dxpo{x)exp[J2,yx{^)zi] and (j){z) = f dy P(y) exp(y ■ 
z) = f c?xp(x) exp[^j j/i(x)zi], assuming that the inte¬ 
grals exist. Multiplying Eq. (IA3I) by exp(y • z) and inte¬ 
grating over y, we obtain 




(AS) 


where the denominator ensures normalization, 0(0) = 1. 
With (yj) = d(j){z)/dzj\^^Q, Eq. (IA4I1 for the vector of 
forces f becomes satisfy 


f = 


01 n0o [z-6l-i(S-i)f] 


dz 




(A6) 


z—0 


where ln0o(z) is the cumulant generating function of the 
reference distribution of observables. 


In cases where the equations cannot be solved directly, 
one can determine the force-vector f adaptively. In the 
following, we present a simple algorithm that can be 
combined with existing simulation procedures. This ap¬ 
proach is related to that of White and Voth;^ in which 
an adaptive gradient-based method is used to construct 
a distribution in which the yj-averages exactly match 
y^(ot>s)^ Here, by contrast, we include measurement er¬ 
rors and thus do not demand exact agreement with the 
observed values. Instead, the fj have to be determined 
self-consistently to satisfy Eq. (IA4I) . In our adaptive op¬ 
timization, we adjust the generalized forces fj “on the 
fly” according to the running averages of the yj , 

f(t) = i^‘dry[x(r)]-Y(°‘’^), (A7) 

with initial value f(0) = y[x(0)]. Here, the trajectory 
x(<) evolves according to the time-dependent potential 
energy U{x) + y(x)S~^f(t)/d. We note that by extend¬ 
ing the phase space to include both x and f, this algo¬ 
rithm can be cast in a Markovian form. If A/ is the Li- 
ouville evolution operator for the phase space density of 
X according to the molecular dynamics or Monte Carlo 
simulation protocol and potential U{x) + y(x)S~^f/6* 
with fixed f, then the extended phase space density 
p = p(x, f, t) satisfies a Markovian Liouville-type evo¬ 
lution equation 


dp 




(AS) 


Using this relation, one can show that for the Gaussian 
example in the main text, with the mean as observ¬ 
able and overdamped diffusion for the dynamics of x, 
the adaptive sampling is globally converging to the op¬ 
timal Bayesian ensemble distribution. For the example 
in Fig. [5J with Monte Carlo sampling of x, we observed 
convergence numerically. 

We note that in the adaptive determination of the gen¬ 
eralized forces fj defining the posterior distribution, vari¬ 
ants of Eq. (ITtI) are possible. In particular, one can av¬ 
erage between an initial guess fo and the evolving mean, 
e.g., as f(<) = u;(t)fo + [1 - w(t)][t"^dTy[x(T)] - 
Y(°bs)], is a weight function that decreases 

to zero with time, e.g., w{t) = exp{—t/to) for a suitably 
chosen relaxation time to • 
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